Abstract. A reformulation of the path length of binary search trees is given in terms of permutations. This allows the definition to be extended to words whose letters are obtained from independent geometric random variables (with parameter $q$). In this way, expressions for the expectation and the variance are obtained which, in the limit $q \to 1$, reduce to the classical expressions.



The path length $p(t)$ of a binary search tree $t$ satisfies the recursion $p(t) = p(t_L) + p(t_R) + |t_L| + |t_R|$, where $t_L$ and $t_R$ are the left and right subtrees of the root. ($|t|$ denotes the size of the tree $t$, i.e., the number of nodes.)

Binary search trees are obtained from permutations. For some background see H, 2, §■ 
Our aim is to rewrite the definition of the path length in terms of permutations, since this makes it possible to obtain $q$-analogues: we consider words over the alphabet $\{1, 2, \dots\}$ instead, where the letters have probabilities $p, pq, pq^2, \dots$ with $p + q = 1$ (geometric probabilities). In the limit $q \to 1$, this model turns into the model of random permutations, as equal letters appear with probability 0 and each relative ordering is equally likely.

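For a permutation $\pi = \pi_1 \dots \pi_n$ we define $p(\pi)$ by

\[
p(\pi) = \bigl|\{(j,k) \mid 1 \le j < k \le n,\ \pi_j = \min\{\pi_j, \dots, \pi_k\} \text{ or } \pi_k = \min\{\pi_j, \dots, \pi_k\}\}\bigr|.
\]

Then $p(\varepsilon) = 0$ for the empty permutation $\varepsilon$ and, if $\pi = \sigma 1 \tau$, then $p(\pi) = p(\sigma) + p(\tau) + |\sigma| + |\tau|$: every letter of $\sigma$ and of $\tau$ forms a counted pair with the minimum 1, whereas pairs with the left coordinate in $\sigma$ and the right coordinate in $\tau$ are definitely not counted.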

But this definition of $p(\pi)$ can be taken as it stands when $\pi_1 \dots \pi_n$ denotes a word over the alphabet $\{1, 2, \dots\}$. This will be our starting point.
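
The definition above can be turned into a few lines of code. The following is a minimal sketch, not taken from the paper; the function names `path_length` and `recursive_path_length` are illustrative. It counts the pairs directly and, for permutations, compares the result with the recursion obtained by splitting at the minimum.

```python
from itertools import permutations


def path_length(word):
    """Count pairs (j, k), j < k, such that word[j] or word[k]
    equals the minimum of word[j..k] (0-based positions)."""
    n = len(word)
    count = 0
    for j in range(n):
        running_min = word[j]
        for k in range(j + 1, n):
            running_min = min(running_min, word[k])
            if word[j] == running_min or word[k] == running_min:
                count += 1
    return count


def recursive_path_length(perm):
    """Recursion p(sigma 1 tau) = p(sigma) + p(tau) + |sigma| + |tau|,
    valid for sequences of distinct letters (split at the minimum)."""
    if not perm:
        return 0
    i = perm.index(min(perm))
    sigma, tau = perm[:i], perm[i + 1:]
    return (recursive_path_length(sigma) + recursive_path_length(tau)
            + len(sigma) + len(tau))


def recursion_holds(max_n):
    """Check that both definitions agree on all permutations up to size max_n."""
    return all(
        path_length(perm) == recursive_path_length(perm)
        for n in range(max_n + 1)
        for perm in permutations(range(1, n + 1))
    )
```

The direct count also accepts words with repeated letters, for instance `path_length((1, 1, 2))`, which is the situation the geometric model produces.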

We want to point out that our previous paper || contains easier but related parameters. 

In the sequel we want to compute the expectation and the variance of the parameter 
p, for random words of length n. We define random variables 
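\[
L_{jk} = \begin{cases} 1 & \text{if } \pi_j = \min\{\pi_j, \dots, \pi_k\}, \\ 0 & \text{otherwise,} \end{cases}
\]

\[
R_{jk} = \begin{cases} 1 & \text{if } \pi_k = \min\{\pi_j, \dots, \pi_k\}, \\ 0 & \text{otherwise,} \end{cases}
\]

\[
B_{jk} = L_{jk} \cdot R_{jk},
\]

\[
N_{jk} = (1 - L_{jk}) \cdot (1 - R_{jk}).
\]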



(The letters L, R, B, N are chosen to indicate left, right, both, not.) 
Then the parameter p may be described as 



\[
p = \sum_{1 \le j < k \le n} \bigl(L_{jk} + R_{jk} - B_{jk}\bigr).
\]
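
As a worked step that is not spelled out in the original, the probabilities of the indicator events can be computed directly from the geometric model, assuming only that a letter equals $i$ with probability $pq^{i-1}$ and is therefore at least $i$ with probability $q^{i-1}$:

```latex
\begin{align*}
\mathbb{P}(L_{jk} = 1)
  &= \sum_{i \ge 1} p q^{i-1} \bigl(q^{i-1}\bigr)^{k-j}
   = \sum_{i \ge 1} p\, q^{(i-1)(k+1-j)}
   = \frac{p}{1 - q^{k+1-j}}, \\
\mathbb{P}(B_{jk} = 1)
  &= \sum_{i \ge 1} \bigl(p q^{i-1}\bigr)^2 \bigl(q^{i-1}\bigr)^{k-j-1}
   = \frac{p^2}{1 - q^{k+1-j}}.
\end{align*}
```

By symmetry $\mathbb{P}(R_{jk} = 1) = \mathbb{P}(L_{jk} = 1)$, so the pair $(j,k)$ is counted with probability $(2p - p^2)/(1 - q^{k+1-j})$.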



Now we can introduce the generating function 
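\[
f(v) = \Bigl(\frac{p}{q}\Bigr)^{n} \sum_{i_1, \dots, i_n \ge 1} q^{i_1 + \dots + i_n} \prod_{1 \le j < k \le n} \bigl[L_{jk} v + R_{jk} v - B_{jk} v + N_{jk}\bigr];
\]

the coefficient of $v^k$ in $f(v)$ is the probability that the parameter $p$ has value $k$, assuming random words $i_1 \dots i_n$ of length $n$.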

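As always, the expected value is obtained via $E = f'(1)$:

\begin{align*}
E &= \Bigl(\frac{p}{q}\Bigr)^{n} \sum_{i_1, \dots, i_n \ge 1} q^{i_1 + \dots + i_n} \sum_{1 \le j < k \le n} \bigl(L_{jk} + R_{jk} - B_{jk}\bigr) \\
  &= \sum_{1 \le j < k \le n} \Bigl(\frac{p}{q}\Bigr)^{n} \Bigl[\, \sum_{L_{jk} = 1} q^{i_1 + \dots + i_n} + \sum_{R_{jk} = 1} q^{i_1 + \dots + i_n} - \sum_{B_{jk} = 1} q^{i_1 + \dots + i_n} \Bigr] \\
  &= (2p - p^2) \sum_{1 \le j < k \le n} \Bigl(\frac{1}{q}\Bigr)^{k+1-j} \sum_{i \ge 1} q^{i(k+1-j)} \\
  &= p(2 - p) \sum_{1 \le j < k \le n} \frac{1}{1 - q^{k+1-j}} \\
  &= p(2 - p) \sum_{2 \le i \le n} \frac{n + 1 - i}{1 - q^{i}} \\
  &= p(2 - p) \sum_{1 \le i \le n} \frac{n + 1 - i}{1 - q^{i}} - n(2 - p).
\end{align*}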
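
The closed form for $E$ can be checked numerically. The sketch below is an illustration, not the author's method: it truncates the infinite alphabet at a hypothetical cutoff `max_letter` (the neglected probability mass is of order $q^{\text{max\_letter}}$), sums over all words of length $n$, and compares with the formula. The helper names are assumptions made for this example.

```python
from itertools import product


def path_length(word):
    """Count pairs (j, k), j < k, where an endpoint is the minimum of word[j..k]."""
    count = 0
    for j in range(len(word)):
        running_min = word[j]
        for k in range(j + 1, len(word)):
            running_min = min(running_min, word[k])
            if word[j] == running_min or word[k] == running_min:
                count += 1
    return count


def expected_formula(n, q):
    """E = p(2-p) * sum_{i=1..n} (n+1-i)/(1-q^i) - n(2-p), with p = 1 - q."""
    p = 1.0 - q
    total = sum((n + 1 - i) / (1.0 - q ** i) for i in range(1, n + 1))
    return p * (2.0 - p) * total - n * (2.0 - p)


def expected_truncated(n, q, max_letter=40):
    """Approximate E by summing over words with letters in a truncated alphabet.

    Letters are encoded as 0..max_letter-1; letter value i has weight p*q^i,
    which is the geometric weight of the (i+1)-st letter of the alphabet.
    """
    p = 1.0 - q
    weights = [p * q ** i for i in range(max_letter)]
    total = 0.0
    for word in product(range(max_letter), repeat=n):
        prob = 1.0
        for letter in word:
            prob *= weights[letter]
        total += prob * path_length(word)
    return total
```

For $n = 3$ and $q = 1/2$ both functions give a value close to $20/7$; the enumeration grows like `max_letter ** n`, so it is only practical for very short words.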



The terms that would survive the limit $q \to 1$ are

\[
2p \sum_{1 \le i \le n} \frac{n + 1 - i}{1 - q^{i}} - 2n,
\]

and the limit is

\[
\lim_{q \to 1} E = 2 \sum_{1 \le i \le n} \frac{n + 1 - i}{i} - 2n = 2(n+1)H_n - 4n,
\]

as is of course well known.

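Now we turn to the variance, and this is much harder, since we must first compute the second factorial moment, which is obtained via the second derivative $f''(1)$:

\begin{align*}
f''(1) &= \Bigl(\frac{p}{q}\Bigr)^{n} \sum_{i_1, \dots, i_n \ge 1} q^{i_1 + \dots + i_n} \sum_{\substack{1 \le j < k \le n,\ 1 \le l < m \le n \\ (j,k) \ne (l,m)}} \bigl(L_{jk} + R_{jk} - B_{jk}\bigr)\bigl(L_{lm} + R_{lm} - B_{lm}\bigr) \\
  &= \Bigl(\frac{p}{q}\Bigr)^{n} \sum_{i_1, \dots, i_n \ge 1} q^{i_1 + \dots + i_n} \sum_{\substack{1 \le j < k \le n,\ 1 \le l < m \le n \\ (j,k) \ne (l,m)}} \bigl(2 L_{jk} L_{lm} + 2 L_{jk} R_{lm} - 4 L_{jk} B_{lm} + B_{jk} B_{lm}\bigr) \\
  &= 2\Sigma^{LL} + 2\Sigma^{LR} - 4\Sigma^{LB} + \Sigma^{BB}
\end{align*}

(using several symmetries).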

The range $A = \{1 \le j < k \le n,\ 1 \le l < m \le n,\ (j,k) \ne (l,m)\}$ must be split into the following 12 disjoint subranges:

\begin{align*}
A_1 &= \{1 \le j < k < l < m \le n\}, \\
A_2 &= \{1 \le j < l < m < k \le n\}, \\
A_3 &= \{1 \le j < l < k < m \le n\}, \\
A_4 &= \{1 \le j < k = l < m \le n\}, \\
A_5 &= \{1 \le j < l < m = k \le n\}, \\
A_6 &= \{1 \le j = l < k < m \le n\}, \\
A_7 &= \{1 \le l < m < j < k \le n\}, \\
A_8 &= \{1 \le l < j < k < m \le n\}, \\
A_9 &= \{1 \le l < j < m < k \le n\}, \\
A_{10} &= \{1 \le l < m = j < k \le n\}, \\
A_{11} &= \{1 \le l < j < k = m \le n\}, \\
A_{12} &= \{1 \le l = j < m < k \le n\}.
\end{align*}
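
That these 12 subranges really partition $A$ can be confirmed by brute force for small $n$. The following sketch is only an illustration (the names `RANGES` and `count_ranges` are made up for it): it encodes each subrange as a predicate and checks that every ordered pair of distinct pairs satisfies exactly one of them.

```python
from itertools import combinations

# One predicate per subrange A_1, ..., A_12; the bounds 1 <= ... <= n are
# enforced by the enumeration in count_ranges, so only the ordering is tested.
RANGES = {
    1: lambda j, k, l, m: j < k < l < m,
    2: lambda j, k, l, m: j < l < m < k,
    3: lambda j, k, l, m: j < l < k < m,
    4: lambda j, k, l, m: j < k == l < m,
    5: lambda j, k, l, m: j < l < m == k,
    6: lambda j, k, l, m: j == l < k < m,
    7: lambda j, k, l, m: l < m < j < k,
    8: lambda j, k, l, m: l < j < k < m,
    9: lambda j, k, l, m: l < j < m < k,
    10: lambda j, k, l, m: l < m == j < k,
    11: lambda j, k, l, m: l < j < k == m,
    12: lambda j, k, l, m: l == j < m < k,
}


def count_ranges(n):
    """Return {i: |A_i|} for words of length n, raising if the A_i
    do not form a partition of the range A."""
    counts = {i: 0 for i in RANGES}
    pairs = list(combinations(range(1, n + 1), 2))
    for (j, k) in pairs:
        for (l, m) in pairs:
            if (j, k) == (l, m):
                continue
            hits = [i for i, inside in RANGES.items() if inside(j, k, l, m)]
            if len(hits) != 1:
                raise ValueError(f"{(j, k, l, m)} lies in ranges {hits}")
            counts[hits[0]] += 1
    return counts
```

The six subranges with four distinct indices each have $\binom{n}{4}$ elements and the six with a shared index each have $\binom{n}{3}$, which adds up to $\binom{n}{2}\bigl(\binom{n}{2} - 1\bigr)$.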
And we will have contributions $\Theta_i^{LL}$, $\Theta_i^{LR}$, $\Theta_i^{LB}$, $\Theta_i^{BB}$ to $\Sigma^{LL}$, $\Sigma^{LR}$, $\Sigma^{LB}$, $\Sigma^{BB}$, for $i = 1, \dots, 12$, according to the 12 ranges $A_i$.

Therefore we must compute 48 (not necessarily different) contributions.

For convenience, we state them as a lemma. 

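Lemma 1. The contributions $\Theta_i^{LL}$, $\Theta_i^{LR}$, $\Theta_i^{LB}$, $\Theta_i^{BB}$, for $i = 1, \dots, 12$, are given by

\begin{align*}
\Theta_1^{LL} = \Theta_7^{LL} &= p^2 \sum_{1 \le j < k < l < m \le n} \frac{1}{1 - q^{k+1-j}} \, \frac{1}{1 - q^{m+1-l}}, \\
\Theta_2^{LL} = \Theta_8^{LL} &= p^2 \sum_{1 \le j < l < m < k \le n} \frac{1}{1 - q^{k+1-j}} \, \frac{1}{1 - q^{m+1-l}}, \\
\Theta_3^{LL} = \Theta_9^{LL} &= p^2 \sum_{1 \le j < l < k < m \le n} \frac{1}{1 - q^{m+1-j}} \, \frac{1}{1 - q^{m+1-l}}, \\
\Theta_4^{LL} = \Theta_{10}^{LL} &= p^2 \sum_{1 \le j < k = l < m \le n} \frac{1}{1 - q^{m+1-j}} \, \frac{1}{1 - q^{m+1-k}}, \\
\Theta_5^{LL} = \Theta_{11}^{LL} &= p^2 \sum_{1 \le j < l < m = k \le n} \frac{1}{1 - q^{m+1-j}} \, \frac{1}{1 - q^{m+1-l}}, \\
\Theta_6^{LL} = \Theta_{12}^{LL} &= p \sum_{1 \le j = l < k < m \le n} \frac{1}{1 - q^{m+1-j}},
\end{align*}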



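\begin{align*}
\Theta_1^{LR} = \Theta_7^{LR} &= p^2 \sum_{1 \le j < k < l < m \le n} \frac{1}{1 - q^{k+1-j}} \, \frac{1}{1 - q^{m+1-l}}, \\
\Theta_2^{LR} = \Theta_8^{LR} &= p^2 \sum_{1 \le j < l < m < k \le n} \frac{1}{1 - q^{k+1-j}} \, \frac{1}{1 - q^{m+1-l}},
\end{align*}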



Q LR = Q LR =p 2 J2 

l<j <l<k<m<n 

l<j <k=l<m<n 



1 



1 



1 



1 



1 



1 _ qm+l-j I _ qm+l-l I _ qm+l-j I _ qk+l-j I _ qm+l-j 

1 111 

+ 



X _ qm+l-j I _ qm+l-k I _ qm+l-j I _ qk+l-j I _ qm+l-j 

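\begin{align*}
\Theta_5^{LR} &= p^2 \sum_{1 \le j < l < m \le n} \frac{1}{1 - q^{m+1-j}} \, \frac{1}{1 - q^{m+1-l}}, \\
\Theta_6^{LR} &= p^2 \sum_{1 \le j < k < m \le n} \frac{1}{1 - q^{m+1-j}} \, \frac{1}{1 - q^{k+1-j}}, \\
\Theta_{10}^{LR} &= p \sum_{1 \le l < m = j < k \le n} \frac{1}{1 - q^{k+1-l}}, \\
\Theta_{11}^{LR} = \Theta_{12}^{LR} &= p^2 \sum_{1 \le l = j < m < k \le n} \frac{1}{1 - q^{k+1-l}},
\end{align*}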
\begin{align*}
\Theta_1^{LB} = \Theta_7^{LB} &= p^3 \sum_{1 \le j < k < l < m \le n} \frac{1}{1 - q^{k+1-j}} \, \frac{1}{1 - q^{m+1-l}}, \\
\Theta_2^{LB} = \Theta_8^{LB} &= p^3 \sum_{1 \le j < l < m < k \le n} \frac{1}{1 - q^{k+1-j}} \, \frac{1}{1 - q^{m+1-l}},
\end{align*}

0LB _ P)LB _ 3 V^ 1 X 

^3 - U 9 - P 2^ 1 _ Qm+l-j 1 _ Q m+l-l ' 

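\begin{align*}
\Theta_4^{LB} &= p^3 \sum_{1 \le j < k = l < m \le n} \frac{1}{1 - q^{m+1-j}} \, \frac{1}{1 - q^{m+1-k}}, \\
\Theta_5^{LB} &= p^3 \sum_{1 \le j < l < m \le n} \frac{1}{1 - q^{m+1-j}} \, \frac{1}{1 - q^{m+1-l}}, \\
\Theta_6^{LB} &= p^2 \sum_{1 \le j < k < m \le n} \frac{1}{1 - q^{m+1-j}},
\end{align*}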

u io - P Z^ ! _ g fc+i-« ' 

l<£<m=j<fc<n 

U H - P Z^ 1 _ q k+l-l ' 

l<£<j<fc=m<n 

v 2_^ 1 _ ^fc+i- 



9 12 -/' Z^ 1 _gfc+l-/- 

l<l=j <m<k<n 



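\begin{align*}
\Theta_1^{BB} = \Theta_7^{BB} &= p^4 \sum_{1 \le j < k < l < m \le n} \frac{1}{1 - q^{k+1-j}} \, \frac{1}{1 - q^{m+1-l}}, \\
\Theta_2^{BB} = \Theta_8^{BB} &= p^4 \sum_{1 \le j < l < m < k \le n} \frac{1}{1 - q^{k+1-j}} \, \frac{1}{1 - q^{m+1-l}}, \\
\Theta_3^{BB} = \Theta_9^{BB} &= p^4 \sum_{1 \le j < l < k < m \le n} \frac{1}{1 - q^{m+1-j}}, \\
\Theta_4^{BB} = \Theta_{10}^{BB} &= p^3 \sum_{1 \le j < k = l < m \le n} \frac{1}{1 - q^{m+1-j}},
\end{align*}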

e BB = e M = D 3 V l l 

^5 w ll ^ 2_^ j _ qm+l-j I _ ^m+l-Z ' 

l<j<l<m<n 

\[
\Theta_6^{BB} = \Theta_{12}^{BB} = p^3 \sum_{1 \le j = l < k < m \le n} \frac{1}{1 - q^{m+1-j}}.
\]



Proof. The computations are as (or slightly more complicated than) the one for the 
expected value. We don't give more details. □ 



6 HELMUT PRODINGER 

We simplify those sums and write $a_i = \frac{1}{1 - q^{i}}$ for convenience.

Lemma 2. 

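\begin{align*}
\Sigma^{LL} &= 2p^2 \sum_{2 \le i, j \le n-2,\ i + j \le n} a_i a_j \binom{n + 2 - i - j}{2} \\
  &\quad + 2p^2 \sum_{2 \le i < j \le n} a_i a_j (n + 1 - j)(j - i - 1) \\
  &\quad + 2p^2 \sum_{3 \le i < j \le n} a_i a_j (n + 1 - j)(i - 2) \\
  &\quad + 4p^2 \sum_{2 \le i < j \le n} a_i a_j (n + 1 - j) \\
  &\quad + 2p \sum_{3 \le i \le n} a_i (i - 2)(n + 1 - i),
\end{align*}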



2<i,j<n-2,i+j<n ^ ' 

2<«<i<n 

+ V J2 aia i (n+l-j)(i-2) 

3<i<j<n 

-2p 2 J2 ^( l ~ 2 \n + l-i) 

4<i<n ^ ' 

+ Ap 2 ^2 CLiUjin + 1 - j) 

2<i<j<n 

+ P 2 Yl ai(*-2)(w + l-i) 



3<i<n 



+ p 5Z ai(i-2)(n + l-i), 



3<i<n 



3»=v e ^(" +2 2 "^0 

2<i,j<n-2,i+j<n ^ ' 

+ 2p 3 ^ OiO^n + l-jOO'-^-l) 

2<i<j<n 

+ 2p 3 Y a i a J (n + 1- j) (i - 2) 

3<i<j<n 

+ 2p 3 Y Oittjin + l-j) 

2<i<j<n 

+ (Sp 2 +p 3 ) J2 <k{i-2)(n + l-i), 



3<Kn 
2<i,j<n-2,i+j<n ^ 

2<«<i<ra 

2<j<i<n 

4<i<n ^ ' 

+ 4p 3 ^ ai(i-2)(n + l-i). 

3<«<n 



The variance is given by

\[
V = 2\Sigma^{LL} + 2\Sigma^{LR} - 4\Sigma^{LB} + \Sigma^{BB} + E - E^2.
\]

In order to simplify this expression, we note the following formulae:

Lemma 3.



□ 



^ ,n+2-i-j 

X n i a j[ 2 



p~ y aid 

2<i,j<n—2,i+j<n 



2p 2 J2 a i a J ( 2 ) ~ p2 J2 ai [ 2 ) (' ~ X ) 



n + 2 — j\ 9 v^ / n + 2 — i 

2 )--^-' 



Ki<n 



Proof. Note that

\[
\frac{1}{1 - q^{i}} \, \frac{1}{1 - q^{j}} = \frac{1}{1 - q^{i+j}} \Bigl( \frac{1}{1 - q^{i}} + \frac{1}{1 - q^{j}} - 1 \Bigr)
\]

and do some trivial rearrangements. □
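
The partial-fraction identity used in this proof is easy to confirm with exact rational arithmetic. The sketch below is a small check written for illustration; the helper names `a` and `identity_holds` are not from the paper.

```python
from fractions import Fraction


def a(i, q):
    """a_i = 1 / (1 - q^i), evaluated exactly for a rational q."""
    return 1 / (1 - q ** i)


def identity_holds(i, j, q):
    """Check a_i * a_j == a_{i+j} * (a_i + a_j - 1) without rounding error."""
    return a(i, q) * a(j, q) == a(i + j, q) * (a(i, q) + a(j, q) - 1)
```

Calling `identity_holds(2, 3, Fraction(2, 3))` evaluates both sides as fractions, so equality is tested exactly rather than up to floating-point tolerance.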

Lemma 4.

\[
\Bigl( \sum_{1 \le i \le n} a_i (n + 1 - i) \Bigr)^{2} = 2 \sum_{1 \le i < j \le n} a_i a_j (n + 1 - i)(n + 1 - j) + \sum_{1 \le i \le n} a_i^{2} (n + 1 - i)^{2}.
\]

Proof. Obvious. □

Using these lemmata and numerous simplifications that were partially supported by 
Maple, we can state our main result: 
Theorem 1. The expectation and the variance of the $q$-ified path length in words of length $n$, generated by $n$ independent geometric random variables, are given by

\[
E = p(2 - p) \sum_{1 \le i \le n} \frac{n + 1 - i}{1 - q^{i}} - n(2 - p)
\]

and 



Y = 2p 2 J2 



(n+ 1 - j)(Ai + p(5 - Ai)) 



l<i<j<n v ^ /v * ' 

-p (2-p) ^ — - 



Ki<n 



y^ U+ J / _ 2 + , ^ + 4n _ lg • + 3i 2 + 7 N 



n + 1 — i 

l<i<n 

+ Ap 2 (ni - n + Zi - 1 - i 2 ) + p 3 (-ni + n + 2i 2 - Si 
+ 5pn — 3p 2 n — 2p 3 n. 

The terms in the variance that would survive the limit $q \to 1$ are these:

\[
8p^2 \sum_{1 \le i < j \le n} \frac{(n + 1 - j)\, i}{(1 - q^{i})(1 - q^{j})} - 4p^2 \sum_{1 \le i \le n} \frac{(n + 1 - i)^{2}}{(1 - q^{i})^{2}} + 2p \sum_{1 \le i \le n} \frac{(n + 1 - i)(3i - 1)}{1 - q^{i}}.
\]

The limit is

\begin{align*}
\lim_{q \to 1} V &= 8 \sum_{1 \le j \le n} \frac{(n + 1 - j)(j - 1)}{j} - 4 \sum_{1 \le i \le n} \frac{(n + 1 - i)^{2}}{i^{2}} + 2 \sum_{1 \le i \le n} \frac{n + 1 - i}{i} (3i - 1) \\
  &= 4(n+1)n - 8(n+1)H_n + 8n - 4(n+1)^2 H_n^{(2)} + 8(n+1)H_n - 4n \\
  &\quad + 3n(n+1) - 2(n+1)H_n + 2n \\
  &= 7n^2 - 4(n+1)^2 H_n^{(2)} - 2(n+1)H_n + 13n.
\end{align*}

This is (of course!) the variance in the classical case.
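
The two classical limits, $2(n+1)H_n - 4n$ for the mean and $7n^2 - 4(n+1)^2 H_n^{(2)} - 2(n+1)H_n + 13n$ for the variance, can be compared with an exhaustive enumeration of permutations for small $n$. This is a sketch for illustration only; the function names are assumptions, and the enumeration is feasible only for small $n$ since it visits all $n!$ permutations.

```python
from fractions import Fraction
from itertools import permutations


def path_length(word):
    """Count pairs (j, k), j < k, where an endpoint is the minimum of word[j..k]."""
    count = 0
    for j in range(len(word)):
        running_min = word[j]
        for k in range(j + 1, len(word)):
            running_min = min(running_min, word[k])
            if word[j] == running_min or word[k] == running_min:
                count += 1
    return count


def classical_moments(n):
    """Exact mean and variance of the path length over all n! permutations."""
    values = [path_length(perm) for perm in permutations(range(1, n + 1))]
    mean = Fraction(sum(values), len(values))
    second_moment = Fraction(sum(v * v for v in values), len(values))
    return mean, second_moment - mean * mean


def closed_forms(n):
    """Mean and variance from the limiting formulas, using exact harmonic numbers."""
    h1 = sum(Fraction(1, i) for i in range(1, n + 1))       # H_n
    h2 = sum(Fraction(1, i * i) for i in range(1, n + 1))   # H_n^{(2)}
    mean = 2 * (n + 1) * h1 - 4 * n
    variance = 7 * n * n - 4 * (n + 1) ** 2 * h2 - 2 * (n + 1) * h1 + 13 * n
    return mean, variance
```

Using `Fraction` keeps both sides exact, so the comparison `classical_moments(n) == closed_forms(n)` is a genuine equality test rather than a tolerance check.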